[Transform] QuIP Modifier #1648
Conversation
Evals on Llama 3.2 1B with QuIP (num_fewshot 8, limit 1000 to be compatible with results here):
beautiful
very nice - can we fix the example?
@@ -27,5 +28,9 @@ def __call__(
    :param dataloader: loads data for calibration
    :param dataset_args: dataset arguments relevant to pipelines
    """
    # some ops are still performed on the model by modifiers
    # we want those ops to occur on the GPU
    dispatch_for_generation(model)
That means we can leverage more than one gpu for data free cases, including weight-only RTN schemes?
Technically yes, although the weights are still calibrated in a synchronous for loop, so there's no speedup gained from the extra GPUs.
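For context, a rough sketch of that data-free pattern is below: the model can be spread across GPUs, but weight-only RTN calibration still visits modules one at a time. The toy model and the rtn_quantize_weight helper are illustrative only, not llm-compressor's actual implementation.

```python
import torch

# toy stand-in for a model that has already been dispatched across devices
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.Linear(256, 256))

def rtn_quantize_weight(w: torch.Tensor, num_bits: int = 4, group_size: int = 128) -> torch.Tensor:
    """Symmetric round-to-nearest fake-quantization, grouped along the input dim (illustrative)."""
    grouped = w.reshape(-1, group_size)
    qmax = 2 ** (num_bits - 1) - 1
    scale = grouped.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(grouped / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)

# the calibration loop visits each linear layer sequentially; even if layers
# live on different GPUs, only one is being quantized at any given moment
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        module.weight.data = rtn_quantize_weight(module.weight.data)
```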
looks great! just a couple comment/docstring requests
# Configure the quantization algorithm to run.
# * apply spinquant transforms to model in order to make quantization easier
# * quantize the weights to 4 bit with GPTQ with a group size 128
I had to update this in the spinquant example as well
Suggested change:
-# * quantize the weights to 4 bit with GPTQ with a group size 128
+# * quantize the weights to 4 bit and group size 128
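For reference, a sketch of the kind of recipe those example comments describe. The QuIPModifier import path and its transform_type argument are assumptions based on this PR, and the model name and calibration dataset are placeholders, not values confirmed by the diff.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.transform import QuIPModifier  # import path assumed from this PR

recipe = [
    # apply QuIP transforms to the model in order to make quantization easier
    QuIPModifier(transform_type="hadamard"),  # argument name is an assumption
    # quantize the weights to 4 bit with GPTQ (W4A16 defaults to group size 128)
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(
    model="meta-llama/Llama-3.2-1B-Instruct",  # placeholder model
    dataset="ultrachat_200k",                  # placeholder calibration set
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```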
QuIP and QuIP# apply transforms to every linear layer, two of which are fused into
the model weights and two of which remain as online rotations computed at runtime.
Should add lifecycle here, can probably copy-paste from SpinQuantModifier
Beautiful. Great work
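To make the fused-versus-online split from that docstring concrete, here is a small numerically checked sketch. Generic orthogonal matrices stand in for QuIP's Hadamard-based transforms, and the names U and V are illustrative, not the modifier's actual parameters.

```python
import torch

torch.manual_seed(0)
d_in, d_out = 64, 64

# orthogonal rotations standing in for QuIP's Hadamard-based transforms
U, _ = torch.linalg.qr(torch.randn(d_out, d_out, dtype=torch.float64))
V, _ = torch.linalg.qr(torch.randn(d_in, d_in, dtype=torch.float64))

W = torch.randn(d_out, d_in, dtype=torch.float64)  # original linear weight
W_fused = U @ W @ V                                 # two transforms fused into the weight

x = torch.randn(d_in, dtype=torch.float64)
y_ref = W @ x

# two transforms remain online: rotate the input before the fused weight,
# then rotate the output back afterwards
y = U.T @ (W_fused @ (V.T @ x))

assert torch.allclose(y, y_ref, atol=1e-9)  # rotations cancel, so the output is unchanged
```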
targets="Linear", scheme="FP8_BLOCK", ignore=["lm_head", "re:.*mlp.gate$"], | ||
targets="Linear", | ||
scheme="FP8_BLOCK", | ||
ignore=["lm_head", "re:.*mlp.gate$"], |
I think this was fixed in a previous PR. Maybe rebase
# Configure the quantization algorithm to run.
# * apply spinquant transforms to model in order to make quantization easier
# * quantize the weights to 4 bit with GPTQ with a group size 128
"using GPTQ" should still be there
Purpose
Prerequisites
Changes
- quip_example.py to examples folder
- QuIPModifier which handles the construction of a quip-style transform config
Testing
Evaluation
Evaluation performed by @brian-dellabetta
Evals on Llama 3.2 1B with QuIP (num_fewshot 8, limit 1000 to be compatible with results here):
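A sketch of how such an eval could be run with lm-evaluation-harness is below. The quantized model path and task list are placeholders (the exact tasks are not listed here), and the usage assumes the harness's evaluator.simple_evaluate entry point.

```python
from lm_eval import evaluator

# num_fewshot=8 and limit=1000 match the setup described above;
# the model path and tasks below are placeholders, not the actual eval config
results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=./Llama-3.2-1B-QuIP-W4A16",
    tasks=["gsm8k"],
    num_fewshot=8,
    limit=1000,
    batch_size=8,
)
print(results["results"])
```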
Follow Ups